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We develop a new estimation technique for recovering depth-of- 
field from multiple stereo images. Depth-of-field is estimated by de- 
termining the shift in image location resulting from different cam- 
era viewpoints. When this shift is not divisible by pixel width, the 
multiple stereo images can be combined to form a super-resolution 
image. By modeling this super-resolution image as a realization of 
a random field, one can view the recovery of depth as a likelihood 
estimation problem. We apply these modeling techniques to the re- 
covery of cloud height from multiple viewing angles provided by the 
M1SR instrument on the Terra Satellite. Our efforts are focused on 
a two layer cloud ensemble where both layers are relatively planar, 
the bottom layer is optically thick and textured, and the top layer 
is optically thin. Our results demonstrate that with relative ease, we 
get comparable estimates to the M2 stereo matcher which is the same 
algorithm used in the current MISR standard product (details can be 
found in [IEEE Transactions on Geoscience and Remote Sensing 40 
(2002) 1547-1559]). Moreover, our techniques provide the possibility 
of modeling all of the MISR data in a unified way for cloud height 
estimation. Research is underway to extend this framework for fast, 
quality global estimates of cloud height. 

1. Introduction. The motivation for this paper comes from the problem 
of recovering cloud height from the Multi- Angle Imaging SpectroRadiometer 
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(MISR) instrument, launched in December 1999 on the NASA EOS Terra 
Satellite. Clouds play a major role in determining the Earth's energy bud- 
get. As a result, monitoring and characterizing the distribution of clouds 
becomes important in global studies of climate. The MISR instrument pro- 
duces images (275 m resolution) in the red band over a swath width of 360 
km for nine camera angles: 70°, 60°, 45.6°, 26.1° forward angles, a nadir 
view at 0°, and aft angles 26.1°, 45.6°, 60°, 70° (referred to as Df, Cf, Bf, 
Af, An, Aa, Ba, Ca and Da respectively). By taking advantage of the image 
displacement that results from multi-angle image geometry, one can recover 
cloud top height and cloud motion (where cloud motion is determined from 
wind). Unfortunately, transparency, multiple layers, occlusion and height 
discontinuities present challenges for cloud height estimation. In this paper 
we apply new statistical techniques for estimating cloud height and attempt 
to recover cloud height for a two layer ensemble: an optically thin top layer 
over a textured bottom layer. 

The image displacement that results from ground registration is, almost 
always, not divisible by the pixel width. By taking advantage of this offset, 
one can construct a super-resolution image from the different viewing angles. 
It is this super-resolution image that we model as a discrete sample from a 
realization of a continuous Gaussian random field. Under this paradigm, es- 
timating height and wind can be viewed as a statistical likelihood estimation 
problem. There are several advantages of this approach. First, by changing 
the model of the latent continuous image, one can change the matching 
characteristics of the algorithm and, potentially, optimize the matching for 
different cloud ensembles. Second, the super-resolution framework extends 
naturally when there are more than two camera angles. Finally, the modeling 
of the super-resolution image gives a unified way of estimating sub-grid-scale 
displacement. 

There is a considerable amount of existing literature on both the recovery 
of three dimensional structure from multiple stereo images and constructing 
super-resolution images from multiple stereo images. The literature on both 
problems is vast and spans over at least 20 years. We refer readers to two re- 
views, Brown, Burschka and Hager (2003) and Dhond and Aggarwal (1989), 
on the recovery of depth-of-field and two reviews, Farsiu et al. (2004) and 
Park, Park and Kang (2003), on the super-resolution problem. To the au- 
thors' knowledge, the two problems have yet to be considered concurrently 
for recovery of depth-of-field which is the focus of the current paper. 

The rest of the paper is organized as follows. In Section 2 we describe our 
new technique for estimating depth-of-field from multiple images taken from 
different viewpoints. In Section 3 we show how to use this super-resolution 
framework for estimating cloud height from the MISR data. Finally, Section 
4 presents our test results for cloud height estimation. 
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2. The super-resolution likelihood. In this section we describe our tech- 
nique for recovering depth-of- field from multiple stereo images. We start with 
our notation for nonuniformly sampled images and finish with a presentation 
of our new estimation methodology that uses super-resolution techniques 
and random field models to define a likelihood for estimating depth-of-field. 
This exposition is done as general as possible to avoid letting the details 
of the cloud height problem obstruct the general estimation procedure (the 
details of the cloud height estimation are then presented in Section 3) . 

2.1. Image notation. Even though it is easy to visualize the construc- 
tion of a super-resolution image, the mathematical notation to express this 
construction is somewhat clumsy. The basic object for our notation is an 
image, denoted (x, y). The pixel locations are encoded in the ordered list of 
spatial coordinates x and the corresponding gray values, or radiances, are 
encoded in the vector y. In the regression setting, one might consider x to 
be a matrix with two columns where each row is a pixel location. However, 
we prefer to use the 'list' characterization so that, for example, a function 
/:1R 2 — ► R evaluated at a list of locations x, denoted /(x), will represent 
component- wise evaluation of / on each element of the list (rather than 
row- wise evaluation using the regression notation). In this way, shifting all 
the spatial locations in x by the same 5e1 2 can be written x + S. 

The pixel locations x allow us to define nonuniformly sampled images. 
Therefore, when working with one image, x serves to specify the overall 
structure of the image. When working with multiple images, the individ- 
ual pixel locations may be useful for comparing locations across multi- 
ple images. For example, if one has n images on a square grid of equal 
size, it may make sense to set the lower left pixel at the origin, all with 
the same pixel spacing. Multiple images will be written with superscripts 
(x^^yW), . . . , (x( n ),y( n )), reserving subscripts to denote the elements of a 
list. In particular, Xj and yj denotes the jth pixel and the corresponding jih 
gray value or radiance. For example, a 2 x 2 gray level image could be writ- 
ten x = ((0, 0), (0, 1), (1, 0), (1, 1)), y = (0.4,0.2,0.1,0.7) T so that the second 
pixel has coordinates X2 = (0, 1) with gray value y2 = 0.2. 

Now the super-resolution image is defined by direct concatenation of 
the pixel locations and the gray values. In particular, let (x",yW) and 
(x( 2 ),y( 2 )) be two images which we want to overlay to construct a super- 
resolution image. The intuition is that these two images are of the same 
physical object but projected on different pixel grids (see Figure 1). The 

super-resolution image is defined as ((^(2))> (y(2)))- ^ may be the situation 

that the first image (x^^y^ 1 )) needs to be shifted by some vector 5 G M 2 be- 
fore it is overlaid with the second image (x( 2 ),y( 2 )). The process of shifting 
the first image by d then overlaying the two to construct a super-resolution 

image is written ((^"j" 5 ), (y(2)))- 
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Fig. 1. An illustration of the construction of the super-resolution image (x,y). The object 
captured by (x' 1 - 1 , y' 1 ') shifts to (x*- 2 ^, ) in the next camera angle. If the parallax is 
known and is not a multiple of the grid spacing, the two image patches can be overlaid to 
create a super-resolution image. 



2.2. Super-resolution, random fields and depth- of- field. We start with n 
images, each representing different pictures of the same object taken from n 
different camera viewpoints (see the first illustration of Figure 2). Since the 
cameras are from different viewpoints, the object will appear in different 
image locations in each camera (see the second illustration of Figure 2). 
We define parallax as the difference in image location of the object in each 
camera. Notice that once the geometry of the camera configuration is fixed, 




Fig. 2. An illustration of the distance parameter d and parallax. Once the geometry of 
the camera array is fixed, the parameter d completely determines the parallax. 
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the parallax is completely characterized by the distance, d, of the object to 
one of the cameras. Typical estimation of d (i.e., depth-of- field) amounts to 
estimating the parallax of the object between each pair of cameras then using 
these parallaxes to solve for d. Our method, on the other hand, estimates 
d directly, treating it as an unknown statistical parameter. The advantage 
is that d completely characterizes the parallax observed between each pair 
of cameras, thereby reducing the problem to estimating a single parameter 
and modeling the observations jointly rather than pairwise. 

In more detail, consider a physical object captured in the image patch 
(x^^yW) and let d € (0, oo) denote the distance of this object to the first 
camera. As one varies d, there will exist different image patches (x^ 2 - 1 , y ^ ) , . . . 
( x ( n ),y( n )) from the subsequent camera angles, all of which capture the ob- 
ject appearing in (xW,yW). This implies the existence of shifts 62,..., 5 n 
so that the new super-resolution image, (x,y), given by 

( x(1) \ 



(1) 



x( 2 )+5 2 



W n) +<w 



y (1) 



y(») 



is a picture of the same object captured by (x^'j' 1 ') (see Figure 1). Note 
that the shifts 82, ■ ■ ■ , 8 n and the image patches (x( 2 ) , y ^ ) , . . . , ( X ( n ) ,yW) 
depend solely on the distance parameter d. We call the process of combining 
image patches to construct a super-resolution image "interlacing." 

To compute a likelihood for estimating the parameter d, we start by 
supposing there exists a latent continuous function Y : R 2 — ► M that mod- 
els the continuous image in the local patch (x^^yW). In other words, 
Y(x^) = yW. Note that Y(x) denotes the list obtained by evaluating Y 
at each pixel location in the list x. For the true distance parameter d, the 
super-resolution image (x,y) will also satisfy Y(x) = y. The wrong distance 
parameter will interlace images of different regions which will result in a 
'noisy' super-resolution image for which it will be difficult to find a contin- 
uous interpolator that satisfies Y(x) = y. By putting a probability measure 
P on the latent continuous image Y, we can estimate the distance of the 
object by minimizing the following negative log likelihood: 

(2) -*(d):=-logP(r(x)=y), 

where the super-resolution image (x,y), depending on d, is constructed as 
in (1). 

An equivalent way to specify the log-likelihood (2) is to simply claim 
that there exists image patches (x^ 2 ) , y^ 2 ^), . . . , (x^ n ) , y^ n ^) and location shifts 
62, ■ ■ ■ ,S n , all depending on the parameter d, such that Y(x^) = y^ and 

(3) yW=y(xW + y, k = 2,...,n. 
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Since Y does not depend on k in the above equation, the super-resolution 
image denned by (1) satisfies y = ^(x). Therefore, under the model P, the 
joint distribution of y is the same as that obtained in (2). This alternative 
way of expressing the super-resolution likelihood will become useful in the 
following sections for introducing nuisance parameters associated with the 
MISR cameras. 

3. Cloud height estimation. This section contains the modeling details 
for cloud height estimation from MISR satellite data. In our view, one of 
the advantages of the estimation techniques outlined in the previous section 
is the ease in which the technical details of a particular observation scenario 
can be incorporated into the recovery of depth-of-field. We demonstrate this 
flexibility in our application of this methodology to cloud height estimation. 
This section starts with a discussion of the relationship between parallax, 
cloud height and wind. Section 3.1 presents a summary of all the modeling 
assumptions used to compute the maximum likelihood estimation of height. 
A more thorough explanation of the modeling assumptions and the tech- 
niques for computing the likelihood can be found in Sections 3.1.1, 3.1.2 
and 3.1.3. 

For the MISR data, instead of estimating distance to the satellite, we 
want to estimate cloud height and wind. Since the parallax of a cloud patch 
is completely determined by height and wind, the techniques from Section 
2 are easily applicable. Indeed, one of the main features of the methodology 
developed in the last section is that it relates all the observed parallaxes to 
a single parameter d. In the MISR data, the parameter d, that is, distance 
of the object to the satellite, is directly related to cloud height (since the 
height of a cloud determines the distance to the satellite). However, there is 
an added complication of cloud movement as a function of wind. Since wind 
will also effect the image location of a cloud in each camera, we include 
it as a parameter so that now height h and a wind velocity v := (vi,V2) 
completely determine parallax and the construction of the super-resolution 
image (x,y). 

For the MISR images there is an approximate linear relationship that 
relates wind, cloud height and image parallax. Let denote the fly-over 
time delay (in seconds) for the kth camera and the along-track image 
location of a local cloud region in camera image k. When h is in meters, v is 
in meters per second, and the pixel locations specified by x( fc ) all represent 
the same 275 meter grid, the linear equation that relates wind, height and 
along-track parallax are 

(4) v 2 (t® - t^) + fc(tan(0W) - tan(0^)) = - x (j \ 

where 6^ k > is the angle of each camera. The across-track parallax is given by 
vi(t^ — For more details see Diner et al. (1997). 
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Before we continue, we mention that since clouds behave dynamically 
there is the possibility of a change in cloud structure within the time delay 
of the different cameras. However, since this delay is at most 7 minutes (the 
approximate delay from the Df camera to the Da camera) and the pixel 
patches we will be using correspond to about 4 square kilometers (15- by- 15 
pixel patches), we will tacitly assume this effect is negligible. 

3.1. Detailed summary of the model. A major obstacle in using the meth- 
ods in Section 2 for the MISR data is that the images from different angles 
often have different overall brightness. Therefore, it becomes difficult to di- 
rectly interlace patches from different images. In what follows we model this 
brightness change as a linear correction for each camera. 

Suppose the image patch (xW,yW) captures a small cloud region and 
let Y be the continuous representation of this image so that Y^xS 1 ') = y^. 
The true height and wind, (h, v), allow us to retrieve patches (x( 2 ),y( 2 )), . . . , 
(xW,y' n ') from the subsequent camera angles, each capturing the same 
cloud region projected on potentially shifted grids. For the remainder of the 
paper, n denotes the number of cameras and m denotes the number of pixels 
in each patch (xw,yw), k = 1, . . . ,n. Our model for the gray values y^ is 
summarized by 

(5) y (fc) = a k Y(^ + S k ) + A k ^ + b k 
for cameras k = 2, . . . , n, where 

• Y is the latent continuous cloud image, modeled by a Gaussian ran- 
dom field with Matern covariance function K, with parameters (a, p, v) = 
(1,4,4/3) (see Section 3.1.1) so that 

cov(Y(t),y(s))=/C(|t-s|), 

for all t,sGl 2 where | ■ | denotes Euclidean distance; 

• o~ k is a multiplicative brightness correction for each camera; 

• 5 k is a two dimensional shift vector which is completely determined by 
(h, v) and plays the role of shifting the pixel locations x^ 1 ) , . . . , x( n ) so 
that they are interlaced; 

• A k is a 2 x 1 matrix which represents an affine brightness correction for 
each patch; 

• b k is an overall constant additive correction. 

Notice that Y models the latent cloud image for each camera so we obtain a 
joint model for the full super-resolution vector y := (y^ T , . . . ,y^ n ' >T ) T ■ By 
the Gaussian assumption on Y, we have 

(6) y ( fc )~AA(^x( fc ) + 6 fc ,^ Sfc)) 
where S fc := (/C(|xf } - xf ) |)) ij . 
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Since we have introduced an additive and multiplicative brightness cor- 
rection on each patch, it is no longer true that the super-resolution image 
satisfies y = Y(pc), where x is defined by (1). What is true, however, is 
that y = diag(<TiI m , . . . , a n I m )Y(x.) + fi, where [i is a vector with /cth block 
A^x^ + bk and I m is the m x m identity matrix. The main difficulty for es- 
timation of (/j, v) now becomes devising techniques for dispensing with the 
nuisance parameters , Ak , bk ■ This is the main contribution of Sections 
3.1.2 and 3.1.3. Both sections use REML techniques [see Stein (1999)] to 
filter out the additive brightness corrections A^x^ + bk- Dealing with the 
multiplicative nuisance parameters Ufc is more difficult and requires separate 
treatments for high and low clouds. For high clouds, a brightness stabiliza- 
tion allows us to marginalize out o~k- For low clouds, there is no closed form 
solution for the marginalized likelihood and we resort to an approximate 
profile likelihood. 

Remark. The affine correction A^y^ + bk, rather than a higher order 
polynomial, was chosen for two reasons. First, the size of the pixel patch 
is relatively small. Small enough, in fact, so that one might expect a linear 
Taylor approximation to be valid. Second, a linear polynomial seems the 
natural choice for a once, but not twice, differentiable cloud process, which 
is derived in the following subsection. 

3.1.1. Gaussian random field model for P. In (5) we use a Matern auto- 
covariance function /C [with parameters (cr,p,v) = (1,4,4/3)] to model the 
covariance structure of the latent continuous cloud image Y. This section 
outlines our motivation for our choice of the Matern autocovariance function 
and the specific parameter values. The basic idea is to use the observed frac- 
tal behavior of clouds [see Cahalan and Snider (1989), Barker and Davies 
(1992), Varnai (2000), Oreopoulos et al. (2000) and Davis et al. (1997) from 
the atmospheric science literature] to model the fractal nature of Y after ad- 
justing for pixel averaging. Indeed, one of the advantages of the the Matern 
autocovariance function is the flexibility it provides for modeling the fractal 
behavior of Y. 

The Matern autocovariance function KL is defined by 



2"- 1 r(i/)v p J V p 

where fC u is the modified Bessel function [see Stein (1999)]. We model Y, 
the continuous cloud image, as a two dimensional Gaussian random field 
with covariance structure given by cov(Y(s),Y(t)) = )C(\s — t\) for all s,t € 
IR 2 with parameters (a,p,u) = (1,4,4/3). The parameter a is the variance 
var[y(t)] at any fixed point t € M. 2 . The parameter p serves as the range 
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parameter for Y so that cov (Y(s),Y(t)) quickly becomes negligible when 
the lag |s — 1\ is larger than p. The v parameter controls the smoothness of 
the process. 

In our analysis the range parameter p is set to 1100 meters (4 pixel widths) 
and the variance parameter a is 1. With the exception of either very small 
or very large p, different values of p, a do not severely effect our height 
estimates. In the case of very large p it seems that the local patch size 
becomes too small, compared to the range, for appropriate modeling. When 
p is very small, the estimation of height and wind becomes ineffective since 
most of the interlaced images are modeled as nearly uncorrelated. 

Here we derive our justification for v = 4/3 as a plausible value for the 
smoothness parameter. In our analysis, the vector represents the log of 
registered bidirectional reflectance factor (BRF) values from the kth camera. 
Let Ibrf denote the latent continuous BRF image so that Y = log((/j * /brf) , 
where the pixelation is represented by the convolution kernel <p = I[_i/2,i/2] 2 
for the indicator function I. Let /Cbrf be the autocovariance function for 
the continuous BRF image /brf- The two dimensional Fourier transform of 
/Cbrf gives the spectral density 

// B rf( w ) : = TTT-To / exp(-ix T uj)]CBRF(x)dx, 
\i-kY Jm.2 

where x = (x\,X2) an d w — (^1,^2) are both two dimensional vectors. There 
is evidence to suggest that Kolmogorov's 5/3 scaling law holds for // BRP 
[see Cahalan and Snider (1989), Barker and Davies (1992), Varnai (2000), 
Oreopoulos et al. (2000) and Davis et al. (1997)]. Under the additional as- 
sumption of isotropy, the scaling law implies the two dimensional spectral 
density satisfies fi BKF (w) x j^j573+i as M ~~ ¥ 00 • Therefore, the spectral den- 
sity of the pixelated process if * Ibrf is given by 



(7) U*i BBF ( u ) = I^MI 2 // B rf( w ) 



2 / U>\ \ . 2 / u; 2\ 



sine — sine 



2 J V 2 J |a;| 5 /3 



3+1 



as — * 00. Let fy denote the spectral density for the Matern autocovari- 
ance model for Y. We believe a plausible value for the parameter u, for the 
random field Y = log(</? * /brf) 5 corresponds to matching the power-law de- 
cay of fy to that of f(p*i BKF , in the coordinate directions. In particular, by 
(7) and the properties of the Matern spectral density, 

si^i/2) 1 

MbrfW-I 15/3+3 > frM 



1^)5/3+3 ' J*V~J~ | Wl 



2u+2 ■ 



fixing U2 and letting lo\ — > 00. Therefore, to match the decay, we set the 
smoothness parameter v to (5/3 + l)/2 = 4/3. 
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3.1.2. Low-cloud likelihood. Here we give some of the computational tech- 
niques for dealing with the nuisance parameters when estimating the height 
of low textured clouds. We start by constructing a matrix L ('L' for low) 
which filters out the dependence of the observations on and bf-. Then 
we construct an approximate profile likelihood to handle the multiplicative 
nuisance parameters o^. 

For low textured clouds we begin by constructing 11 matrices Z/i, • • • , L n 
which annihilate monomials of order at most 1 so that L^y^ ~ AA(0, cr^Efe). 
In particular, is a (m -3)xm matrix with rows composed of linearly inde- 
pendent vectors in the kernel space {v : [^(x^) <^>i(x( fc )) cf)2(^ k ')] T v T = 0}, 
where 4>q , 4>i and (p2 are the the monomials of order at most 1 . Now define 
the matrix L = diag(Li, . . . , L n ) so that Ly ~ AA(0, A ff SA ff ), where E := 
LEL T , E := (/C(|xj - Xj\))ij and A a := diag(cri/ m _3, . . . , a n L m _ 3 ). There- 
fore, the likelihood of the observation vector Ly, as it depends on (/i, v) and 
er := (a\, . . . ,a n ) T , can be written 



liLyfA-^A-^Ly) 



At this point there are a few options to dispense with the dependence of 
the above likelihood on the nuisance parameters a. The most desirable op- 
tion would be to marginalize out a by integrating J R „ £(h,v,cr\Ly)dcr. 

For even moderate block size m and n = 2, this becomes computationally 
formidable. The second option would be to maximize a profile likelihood 
C(h, v|Ly) := max^gr {C(h, v, <r|Ly)}. This option is somewhat more com- 
putationally tractable but still problematic since there is no closed form for 
the profile likelihood. The easiest option is to estimate on each patch sep- 
arately, a\ := (Lfcy( fc )) T (LfcS/ c L^) _1 (L/ c y( fe ))/m, then maximizing a plug-in 
version of the likelihood C(h, v, <r|Ly), where a .= . . . , cr n ) T . Notice, 
however, that under the correct (/i, v) the patches y*- 1 ) , . . . , y ^ are highly 
correlated, and, therefore, information is lost by estimating cifc separately on 
each patch. In an attempt to alleviate this problem, we explore a compro- 
mise between the full profile likelihood and the overly simplistic case of the 
plug-in likelihood with &. 

Notice that the quadratic term (Ly) T A (T 1 S _1 A (T 1 (Ly) can be written 



lE-^A-iLyf 



i^LxyW , , R n L n y^) 



0"! 



+ ••■ + 



where the matrices R\, . . . ,R n decompose E" 1 / 2 into block form so that 
E" 1 / 2 = (R\, . . . , R n ) and R is the matrix of inner products ((-RjLjyW, 
RjLjy^))ij . Therefore, the log likelihood can be written 

1 n 1 _T ~ 

£(h,v,cr\Ly) =c\ - -log|E| - (m - 3) ^ logo- fc - -a~ T Ra~ l , 
1 k=i 1 
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where c\ is a constant. For a fixed (h,v), maximizing £ over cr is a convex 
problem whose stationary point is characterized by 

(8) Ra~ l - (m - 3)<x = 0. 

This equation defines the profile likelihood but unfortunately has no closed 
form solution for general n. However, there is considerable research on inves- 
tigating iterative algorithms for solving (8) [see Marshall and Olkin (1968), 
Khachiyan and Kalantari (1992) and O'Leary (2003), for example]. The 
problem with iterative algorithms is that the likelihood computation will 
be done many times while searching through height and wind over slid- 
ing blocks. Therefore, including an iterative algorithm in the likelihood 
calculation presents some computational problems. As a compromise, we 
define a "one Newton step" estimate of the stationary point of (8) with an 
initial starting point cr, the maximum likelihood estimate 

J (Lfcy( fc )) T (L/ c SfcL^)~ 1 (Lfey( fc ))/m from each camera separately. 

The Newton step is constructed as in Khachiyan and Kalantari (1992) 
and defined for cr~ 1 rather than a. In particular, define F(cr~ l ) = Rcr~ l — 
(m-3)er and A CT := diag(oi, . . . ,a n ). Now F{a~ l +t) = F(cr' 1 ) + Rr + 
(m — 3) A^.t plus higher order terms in r. Setting this linear approximation 
to zero and solving for r gives the Newton step cr" 1 + r. We use the initial 
starting point cr" 1 . Notice the Newton step may result in a negative variance, 
in which case, the original estimate cr is used instead of the Newton step. 
In summary, to estimate (h,v), we define the "one Newton step" 

f cr" 1 + (R + (m - 3)A|)- 1 ((m - 3)A| - R)&-\ 
(<7"ncwton) 1 := I if positive; 

[ cr" 1 , otherwise, 

and maximize the plug-in log-likelihood 

(h,v) :=argmax£(/i,v,cr newton |Ly). 
(h,v) 

3.1.3. High-cloud likelihood. Here we give some of the computational 
techniques for dealing with the nuisance parameters when estimating the 
height of high thin clouds. Since these heights are notoriously hard to esti- 
mate, we attempt a brightness stabilization on the whole cloud image which 
allows more local information for estimating cloud height. After this sta- 
bilization, we construct a matrix H ( l H' is for high) which filters out the 
common value of the stabilized nuisance parameters Ak and 6^. For the 
brightness corrections cr^, our stabilization allows us to marginalize out the 
common value of ak under a uniform prior which is far more desirable than 
the approximate profile likelihood techniques needed for low clouds. 
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One of the difficulties of high clouds is that they are frequently opti- 
cally thin or partially transparent. This makes estimating parallax diffi- 
cult because the dominant signal is frequently the background rather than 
the cloud itself. In an attempt to overcome this difficulty, we estimate a 
linear transformation on y^ for each camera k = 1, .. . ,n to stabilize the 
brightness corrections modeled by a^^A^ and bk- A more in-depth discus- 
sion on this issue is presented in Section 4. Once stabilization is achieved, 
we have y ~ Af(Ax. + b, a 2 S), where a, A and b are the common values of 
(Tk,Ak and bk respectively and £ := (/C(|x.; — Xj|))«. This allows us to find 
a matrix H for which Hy ~ J\f(0, aHHH T ). H is defined as the matrix 
with rows composed of linearly independent vectors in the kernel space 
{v : [0o (x) 0i (x) 2 (x)] T ^ T = 0}. Notice that now H is not block diagonal 
as it was for L. As a consequence, Hy is a vector of length nm — 3 versus 
(to — 3)n (for the low cloud estimates), where n and m denote the number 
of cameras and patch size respectively. This provides more information for 
matching in an attempt at recovering the high-cloud heights. 

The likelihood of h, v and a, given the observation vector Hy, has the 
following form: 



C(h,v,*\Hy) = ^ m ' exp 



y T H T t~ 1 Hy 



2(J 2 



where £ := HHH T . In contrast with the low-cloud likelihood, we can remove 
the dependence on the unknown parameter a by marginalizing out under a 
uniform improper prior 



£(fc,v|fly)oc / £(fc,v,ff|£Ty)d<7 



2 (nm-4)/2 r((nm-4)/2) 
2|2vrS| 1 /2 [ y T#T£-l# y ](nm-4)/2' 



The integral is obtained by the change of variables x = a 1 and recognizing an 
un-normalized inverse gamma density. Therefore, the log-likelihood which is 
maximized over (h,v) is 

(9) £(h, V \Hy)=c 2 -|log|S| - ^^log[y T H T t^Hy], 

for some constant c 2 . 



Remark. Instead of marginalizing out the unknown a, one could in- 
stead maximize a profile log- likelihood as a function of (h, v), where the 
profiling is over the unknown ex. The resulting profile log-likelihood is C3 — 
|log|S| — ra ™~ 3 log[y r ff T S~ 1 ffy] which, it appears, fails to take into ac- 
count the loss of degrees of freedom associated with the unknown a. 
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3.1.4. Simulation study. Before our methodology is applied to the MISR 
data, we present a short simulation study that provides some evidence that 
the approximations used in the above methodology are not only valid in some 
cases but can also provide an improvement over the M2 algorithm. Although 
our simulations provide favorable evidence for our methodology, we believe 
the real strength lies in its flexibility to incorporate statistical principles 
and science to the problem of cloud height retrieval. Before we continue, we 
stress that this simulation is designed to test the matching algorithm when 
the probabilistic mechanism that generates the cloud images is known. In 
general, this will not be the case. Therefore, we consider simulations of this 
type to be one component of a full investigation of the potential for the 
super-resolution methodology. 

We have three main objectives for our simulations. The first is to show 
that one can obtain good cloud-height estimates by correctly specifying the 
high frequency behavior of the spectral density (as we do in Section 3.1.1) 
even when the the low frequency behavior is mis-specified. Second, we will 
argue that the joint modeling of the full super-resolution image, rather than 
pairwise modeling for each camera, gives better estimates. Finally, we pro- 
vide evidence that our method can outperform the M2 algorithm. 

The simulation is designed to mimic the situation where a small cloud 
patch appears in two larger images of a single cloud scene projected onto 
different grids. The goal is to then estimate the location of that cloud patch 
in the two larger images. To this end, we simulated a random field on a 
narrow two dimensional strip in [0,6/500] x [0,1] with spacing 1/500 in 
the y direction (i.e., vertical) and spacing 3/500 in the x direction. Three 
images were extracted by sub-sampling every third pixel in the y direction. 
Therefore, each image is of the same realization but projected on shifted 
grids. The last strip is then discarded with the exception of a 3 by 4 pixel 
patch with the lower left corner at (0,0.5040). The object is then to find the 
location of the patch, measured along the y-axis, in the other two images. 
To model the situation where there is a lower dimensional parameter [as in 
(4) for the cloud examples] that reduces the search space of patch locations, 
we constructed a latent parameter d € [0,1] that relates to the two image 
locations by y± = d and yi = 1.9(0.5040) — 0.9<i, where y± and 2/2 denote 
the unknown locations of the patch in the two images. Notice that this 
sub-manifold contains the true patch locations (i.e., yi =y2 = 0.5040 when 
d = 0.5040). Finally, the second strip was multiplied by 10 and the patch 
(from the third strip) was multiplied by 5 to model an unknown change in 
brightness across cameras. 

The model used in the simulations is a mean zero Gaussian random field 
with covariance given by 

cov(Y(s),Y(t)) = a 2 |10(s-t)| 8 / 3 + £ c a (s)t a 

o<H<i 
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0<|a|<l 0<|a|,|/3|<l 

where a and f3 are multi-indices (so that t a and s^ 3 are monomials), c a are 
functions, b a ^p are constants, and a = 15 [see page 250 of Light and Wayne 
(1998) for the exact construction of c a and 6o, i( g or page 256 of Chiles and 
Delfiner (1999) for a general discussion on the above covariance function]. 
This model is useful since it has the same high frequency behavior of the 
Matern spectral density when v = 4/3 (which is the model used in the likeli- 
hood computation) but a different low frequency behavior. In particular, the 
random field used in the simulation has a generalized autocovariance pro- 
portional to |t| 8 / 3 and thus a generalized spectral density proportional to 
|tc? |— 8/3— 2 |- gee Section 2.9 of Stein (1999)]. In contrast, the spectral density 
of the Matern autocovariance function used in Section 3.1.1 is proportional 
to (\uj\ 2 + 1/3) -4 / 3-1 and therefore has the same power decay at infinity but 
without the singularity at the origin. 

The above simulation was repeated 500 times, estimating the parameter d 
(the true value at d = 0.5040) on each realization using 5 different matching 
metrics. Table 1 reports the results where the rows are ordered from top to 
bottom by the root mean squared error (RMSE), smallest to largest. The 
method with the smallest RMSE, called 'full likelihood,' uses the techniques 
from Section 3.1.2. The second best method uses the same likelihood tech- 
nique but only pairwise, one for each image. The fact that the full likelihood 
technique has a smaller RMSE than the pairwise likelihood provides evi- 
dence that there is precision gained by fully modeling the joint distribution 
of the super-resolution image. The next method, 'no Newton update,' is the 
same as the full likelihood technique but does not update the scale estimates 
by the Newton step developed in Section 3.1.2. The method 'M2' uses the 
matching metric in equation (1) of Muller et al. (2002) for the MISR soft- 
ware. The fact that the full likelihood outperforms these techniques provides 

Table 1 

The simulation results (500 independent realizations) for estimating the latent parameter 

d (the true value is 0.5040,) which determines the location of a simulated patch in two 
images. The left column indexes the different matching methods. The middle column lists 
the average estimate. The right column lists the root mean sguare error (RMSE) 



Method 


Average estimate 


RMSE 


Full likelihood 


0.50398 


2.8460 x 10~ 4 


Pairwise likelihood 


0.50394 


5.1575 x 10~ 4 


No Newton update 


0.50431 


8.1279 x 10~ 3 


M2 


0.50018 


1.5620 x 10~ 2 


Wrong v 


0.50292 


3.5071 x 10" 2 
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evidence that by modeling the probabilistic mechanism which generates the 
images one can improve the height estimates beyond the M2 algorithm. We 
also reiterate that the full likelihood only correctly specifies the asymptotic 
decay of the spectral density, and not the low frequency behavior. Therefore, 
even with this model mis-specification one still gets an improvement over 
M2. On the other hand, the worst method, 'wrong z/,' uses a Matern auto- 
covariance function with v = 2/3 (the correct value should be v = 4/3) to 
model the second order increments of the super-resolution image, and hence 
incorrectly specifying the high frequency behavior. The fact that using the 
wrong v provides no improvement suggests that it is more important to 
correctly specify the high order frequency behavior of the spectral density 
rather than the low-order frequency. 

Remark. The phenomenon seen in the above simulation, that the high 
frequency behavior of the spectral density is important for cloud height 
estimation, can be interpreted in the context of Stein's work on asymptoti- 
cally optimal interpolation with a mis-specified autocovariance function [see 
Stein (1988), Stein (1990) or Chapter 3 of Stein (1999), for example]. In this 
work it is noted that in many situations the low frequency behavior of the 
spectral density has little effect on the variance of the interpolation errors. 
These results are immediately applicable to cloud height estimation since 
constructing the super-resolution image can be viewed as an interpolation 
problem. In fact, this can also explain the observation noted in Section 3.1.1 
that different values of p do not severely effect the height estimates. This 
comes from the observation that fixing p and estimating a from increments 
of the data (as in Sections 3.1.2 and 3.1.3) can be considered to be a data 
driven way to approximate the correct coefficient on the asymptotic decay 
of the true spectral density. 

4. Test results for cloud height retrieval. This section shows some test 
results for estimating cloud height in a multi-layer cloud region. The test 
region is a cropped image from a MISR swath corresponding to orbit num- 
ber 029145, path 031 and MISR blocks 56-67. Using the conventional grid, 
the upper left corner is located (row, column) = (801,1701) and the lower 
right corner is located (row, column) = (1400,2100) for a total image size 
of 600 x 400. Figure 3 shows the cloud images from the Bf and Cf cam- 
eras. This particular region was selected because of the clear nature of the 
two layer cloud ensemble. Near the right edge of the image, the region is 
dominated by optically thin, high clouds. The left edge contains mostly low, 
textured clouds. As one moves from left to right, the high clouds become 
more optically thick and the low clouds become sparse. This makes the re- 
gion particularly useful for testing the ability to estimate the height of two 
distinct cloud layers. 
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100 200 300 log(BRF) 100 200 300 log(BRF) 

pixel column pixel column 



Fig. 3. Log BRF from the Bf camera (left) and Cf camera (right). 

In addition to the two layer structure, there is evidence to suggest minimal 
wind in both the along-track and across-track direction. This allows one to 
only estimate the height parameter and set the wind vectors to zero. In 
all of the test cases, the interlacing likelihood is maximized over a grid 
with resolution 100 meters over the interval (0,3 x 10 ) meters. All of the 
following height estimates are based on 15-by-16 sliding local patches, with 
the exception of the 15-by-15 patch size used in the MISR software estimates 
and the likelihood profiles found in Figure 7. In particular, for each 15-by-16 
local patch in the Bf camera, a separate likelihood is used to compute a cloud 
height estimate which is then associated with the midpoint of the patch. The 
MISR estimates use an implementation of the M2 stereo matcher which is 
the same algorithm used in the current MISR standard product [details can 
be found in Muller et al. (2002)]. 

Figure 4 shows the cloud-height estimates based on the low-cloud likeli- 
hood (top row) and the high-cloud likelihood (bottom row) using two cam- 
eras (left column) and three cameras (middle column). The images in the 
right column correspond to the differences of the height estimates from three 
cameras and two cameras. Figure 5 shows two of our most interesting esti- 
mates of height (utilizing different camera angles and the different likelihood 
techniques) along with the height estimates from MISR's M2 algorithm. The 
left plot in Figure 5 shows the high-cloud estimates based on the three cam- 
eras Bf, Cf and Df. The middle plot shows the low-cloud estimates based 
on the three cameras Aa, An and Af. The right plot shows the MISR's M2 
estimates based on two cameras Bf and Cf. Compared with MISR's M2 soft- 
ware, one can see that the coverage (in other words, the percentage of good 
height estimates) is comparable. It is also apparent that the three estimates 
recover different components of the cloud layers. The most striking differ- 
ence is the amount of high clouds recovered in the left plot compared to 
the other two. Presumably, a major factor in this recovery is the brightness 
stabilization across cameras. 
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100 200 300 meters 100 200 300 meters 100 200 300 meters 
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Fig. 4. Upper left: Low cloud-height estimates using the two cameras Bf and Cf. Upper 
middle: Low cloud-height estimates using the three cameras Bf, Cf and Df. Upper right: 
Difference of the low cloud-height estimates using two cameras versus three cameras. Lower 
left: High cloud-height estimates using the two cameras Bf and Cf. Lower middle: High 
cloud-height estimates using the three cameras Bf, Cf and Df. Lower right: Difference of 
the high cloud-height estimates using two cameras versus three cameras. 

Figure 6 shows the mean and variance, taken over each column of the 
transformed images, from cameras Bf, Cf and Df. The stabilizing transfor- 
mations are found by visually matching the means and variances of the high 
clouds near the right edge of the test image. From these plots it seems that, 
at least for the test region, the high cloud region can indeed be stabilized as 
seen by similarity of the first two moments near the right edge, where the 
high clouds dominate. There is some reason to believe that such a stabilizing 
transformation exists for general cloud images. The radiance contribution 




100 200 300 meters 100 200 300 meters 100 200 300 meters 

pixel column pixel column pixel cclumn 



Fig. 5. Left: High cloud-height estimates from three cameras Bf, Cf and Df. Middle: Low 
cloud-height estimates from the three cameras Aa, An and Af. Right: Height estimates 
based on the MISR 's M2 stereo matcher using cameras Bf and Cf. 
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Fig. 6. Left: Variances of each column in the transformed images y' 1 ', 1.14y' 2 ' +0.03 
and 1.3y^ +0.09 (where y^ 1 ' , y^ 2 \ y' 3 ^ denote the gray values from cameras Bf, Cf and 
Df respectively) . Right: Mean of each column in the transformed imagey m , 1.14y (2) +0.03 
and 1.3y' 3 ' +0.09. These stabilized gray values are used for the high-cloud estimates. 



from the high clouds is almost exclusively due to initial scattering, whereas 
the radiance contribution from the low clouds must pass though the high 
cloud region, getting a reduction in both mean and variance depending on 
the optical thickness of the top layer. It is not so clear, however, whether 
the stabilizing transformation can be estimated from a multi-layer scene. In 
our test case we can take advantage of the fact that high clouds dominate 
the right hand edge and, therefore, the overall change in mean and variance 
can be estimated. 

Remark. One possible way to automate the estimation of the stabilizing 
transformation is to use classification algorithms to find regions dominated 
mostly by high clouds, estimate the transformation on these regions, then 
extrapolate to the whole multi-layer scene. Indeed, there is existing literature 
on cloud detection which could potentially be adapted for finding these high 
cloud dominated regions in a multi-layer scene [see Shi et al. (2008)]. 

To investigate the difference between the the low-cloud likelihood and 
the high-cloud likelihood, Figure 7 shows the log-likelihood profiles as a 
function of cloud height for two patches from our test scene (both patches 
constitute a 15 x 15 pixel region). The first patch is centered at row 500 and 
column 100 and is dominated by low textured clouds. The second patch is 
centered at row 400 and column 275 and constitutes a region with a two 
layer cloud ensemble. In Figure 7 the top row shows plots of the low-cloud 
likelihood (left/right plot corresponds to the low/multi-layer cloud patch). 
The bottom row corresponds to the high-cloud likelihood (left /right plot 
corresponds to the low/multi-layer cloud patch). The discontinuous nature of 
the graphs is due to the fact that a fixed interlacing was used and, therefore, 
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sub-grid-scale displacement can not be detected. The significance of these 
plots is that the likelihood profiles changes significantly when using the 
likelihood for the high-clouds (bottom row). The result is that the bottom 
layer height is estimated when using the low-cloud likelihood, and the top 
layer height is estimated when using the high-cloud likelihood. One other 
surprising feature of these likelihood profiles is multi-modality. We have three 
possible explanations for this phenomenon. First, as one searches through 
height space, the data from the extracted patches are different. Therefore, 
the likelihood is computed on different data values as one varies the cloud 
height parameter. Second, since the patch sizes are relatively small and the 
cloud images are textured, it is possible that there are multiple patches that 
"look the same" and hence yield a peak of the likelihood. Finally, we think 
that some of these peaks may correspond to multiple cloud layers in the test 
region. 

Summary. We present a new depth-of-field algorithm which uses random 
field models and the concept of super-resolution. By viewing the multi-angle 
cloud images as discrete sub-samples of a continuous random field, one can 
view depth-of-field estimation as a statistical parameter estimation problem. 
Under this paradigm, new tools become available for recovering depth-of- 
field from multiple stereo images and, in some cases, improve sensitivity 
and allow fine tuning for different observation scenarios. We apply these 
techniques to the recovery of cloud height using the MISR instrument on the 
Terra spacecraft. In our application, we attempted to demonstrate the ease 
in which technical details of the stereo cameras and the scientific properties 
of the observations can be incorporated in the estimates. The main focus of 
the test case was to use the special nature of our new estimator to recover 
the heights of a two layer cloud ensemble: an optically thin high cloud layer 
and an optically thick, textured low cloud. We have shown that the recovery 
of two separate layers is indeed possible and could potentially be automated 
for cloud-height estimates on a global scale. These results lay the foundation 
for future research on extending this framework for cloud height estimation 
on a global scale using all nine MISR cameras. Indeed, current research 
is underway to speed up the likelihood estimates and to incorporate more 
information on the observational properties of the MISR cameras. 
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Fig. 7. Profiles of the log likelihood as a function of height. Rows correspond to differ- 
ent likelihood models (top: low cloud model, bottom: high cloud model) and the columns 
correspond to different patches (left: low cloud patch, right: multi-layer cloud patch). 
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